
    Using Predicted Bioactivity Profiles to Improve Predictive Modeling

    Predictive modeling is a cornerstone of early drug development. Using information from multiple domains, or across prediction tasks, has the potential to improve predictive performance. However, aggregating data often leads to incomplete data matrices that can limit modeling. In line with previous studies, we show that generating predicted bioactivity profiles and using these as additional features can improve the prediction accuracy of biological endpoints. Using conformal prediction, a type of confidence predictor, we present a robust framework for calculating these profiles and evaluating their impact. We report the outcomes of several approaches to generating the predicted profiles on 16 cytotoxicity and bioactivity datasets and show that efficiency improves the most when the p-values from conformal prediction are included as bioactivity profiles.
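The core ingredient of the abstract above, conformal p-values that can double as profile features, can be sketched as a minimal inductive (split) conformal classifier. The dataset, model choice, and single-split calibration here are illustrative assumptions, not the paper's actual setup:

```python
# Minimal sketch of an inductive conformal classifier whose per-class
# p-values could be used as predicted bioactivity-profile features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# Nonconformity: 1 - predicted probability of the (hypothesised) class label.
cal_probs = model.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

def p_values(x):
    """Conformal p-value for each class label of a single test compound."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    scores = 1.0 - probs  # nonconformity under each hypothesised label
    return np.array([(np.sum(cal_scores >= s) + 1) / (len(cal_scores) + 1)
                     for s in scores])

pvals = p_values(X_cal[0])  # one p-value per class
```

Appending such p-value vectors, one pair per modeled endpoint, to the descriptor matrix is the kind of "predicted profile as extra features" idea the abstract describes.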

    Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning

    Confidence predictors deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity prediction. In this work we investigate a recently introduced variant of conformal prediction, synergy conformal prediction, focusing on its predictive performance when applied to bioactivity data. We compare its performance to other variants of conformal predictors across multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning, where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction gives promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
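The defining move in synergy conformal prediction is that nonconformity scores from several models, each trained on its own data partition, are averaged before calibration, which is what makes it attractive for federated settings. A hedged toy sketch, with synthetic data and simplified details:

```python
# Toy synergy conformal prediction: three "sites" each train a model on
# their own partition; per-label nonconformity scores are averaged across
# sites before computing conformal p-values on a shared calibration set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=900, n_features=20, random_state=1)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.2, random_state=1)

# Three sites, each training only on its own partition of the data.
parts = np.array_split(np.arange(len(y_train)), 3)
models = [RandomForestClassifier(n_estimators=50, random_state=i)
          .fit(X_train[idx], y_train[idx]) for i, idx in enumerate(parts)]

def synergy_scores(X_):
    # Average per-model nonconformity (1 - P(label)) across the sites.
    return np.mean([1.0 - m.predict_proba(X_) for m in models], axis=0)

cal = synergy_scores(X_cal)[np.arange(len(y_cal)), y_cal]
test_scores = synergy_scores(X_cal[:5])  # (5 compounds, 2 labels)
p = (np.sum(cal[None, None, :] >= test_scores[:, :, None], axis=2) + 1) / (len(cal) + 1)
```

Only nonconformity scores, not raw training data, would need to cross site boundaries, which is the federated-learning appeal described above.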

    LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity – Application to the Tox21 and Mutagenicity Datasets

    Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their higher speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its application to predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability and computational-time issues by adopting a leaf-wise tree-growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity datasets using a Bayesian-optimization-integrated nested 10-fold cross-validation scheme that performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm, offering the best predictive performance while requiring significantly less computational time than the other investigated algorithms across all Tox21 and mutagenicity datasets. We recommend LightGBM for applications in in silico safety assessment and in other areas of cheminformatics, to fulfill the ever-growing demand for accurate and rapid prediction of toxicity- or activity-related endpoints for the large compound libraries present in the pharmaceutical and chemical industry.

    Improving Screening Efficiency through Iterative Screening Using Docking and Conformal Prediction

    High-throughput screening, in which thousands of molecules can rapidly be assessed for activity against a protein, has been the dominant approach in drug discovery for many years. However, these methods are costly and require much time and effort. To improve on this, we apply an iterative screening process in which an initial set of compounds is selected for screening based on molecular docking. The outcome of the initial screen is then used to classify the remaining compounds with a conformal predictor. The approach was retrospectively validated on 41 targets from the Directory of Useful Decoys, Enhanced (DUD-E), ensuring scaffold diversity among the active compounds. The results show that 57% of the remaining active compounds could be identified while screening only 9.4% of the database. The overall hit rate (7.6%) was also higher than with docking alone (5.2%). When limiting the search to the top-scored compounds from docking, 39.6% of the active compounds could be identified, compared to 13.5% when screening the same number of compounds based solely on docking. The conformal predictor also gives a clear indication of the number of compounds to screen in the next iteration. These results indicate that iterative screening based on molecular docking and conformal prediction can be an efficient way to find active compounds while screening only a small part of the compound collection.

    F.S. acknowledges the Swedish Pharmaceutical Society for financial support. The research at Swetox (UN) was supported by Stockholm County Council, Knut & Alice Wallenberg Foundation, and Swedish Research Council FORMAS.
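One iteration of the dock-then-learn loop can be sketched end to end. Everything below is synthetic: the docking scores are a noisy surrogate for activity, the labels are hidden until "screened", and the split conformal step mirrors the one-iteration logic only loosely:

```python
# One iteration of docking-guided iterative screening on synthetic data:
# 1) rank the library by docking score, 2) "screen" the top slice (reveal
# labels), 3) train a model on the screened compounds, 4) use conformal
# p-values to pick which remaining compounds to screen next.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 10))                            # toy descriptors
active = (X[:, 0] + 0.5 * rng.normal(size=n)) > 1.2     # hidden ground truth
dock_score = X[:, 0] + rng.normal(scale=1.5, size=n)    # noisy surrogate

top = np.argsort(-dock_score)[: n // 10]                # screen top 10%
rest = np.setdiff1d(np.arange(n), top)                  # still unscreened

# Split the screened compounds into proper-training and calibration parts.
fit_idx, cal_idx = top[: len(top) // 2], top[len(top) // 2 :]
model = RandomForestClassifier(n_estimators=100, random_state=3)
model.fit(X[fit_idx], active[fit_idx])

cal_probs = model.predict_proba(X[cal_idx])
cal_sc = 1 - cal_probs[np.arange(len(cal_idx)), active[cal_idx].astype(int)]
rest_probs = model.predict_proba(X[rest])
p_active = (np.sum(cal_sc[None, :] >= (1 - rest_probs[:, 1])[:, None], axis=1)
            + 1) / (len(cal_sc) + 1)
next_batch = rest[p_active > 0.2]   # screen these next at 80% confidence
```

The size of `next_batch` is known before any further screening, which is the "clear indication of the number of compounds to screen in the next iteration" claimed above.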

    Maximizing gain in high-throughput screening using conformal prediction

    Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high-throughput approaches. By learning from a subset of the compound library, predictive models can infer which compounds to screen next, resulting in more efficient screening. One way to evaluate screening is to weigh the cost of screening against the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim of maximising gain in iterative screening. Using this setup, we show that by evaluating predictions on the training data, very accurate predictions of which settings will produce the highest gain on the test data can be made. We evaluate the approach on 12 bioactivity datasets from PubChem, training the models on 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8–10 of the 12 datasets. Broadly, our approach can predict which strategy generates the highest gain based on the gain-cost evaluation: screen the compounds predicted to be active, screen all the remaining data, or screen no additional compounds. When the algorithm indicates that the predicted active compounds should be screened, it also indicates what confidence level to apply in order to maximise gain. Hence, our approach facilitates decision making and allocation of resources where they deliver the most value by indicating in advance the likely outcome of a screening campaign.

    The research at Swetox (UN) was supported by the Knut and Alice Wallenberg Foundation and Swedish Research Council FORMAS. AMA was supported by AstraZeneca.
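The three-way decision the abstract describes reduces to comparing a simple gain function across strategies. A toy version, with made-up hit counts and an assumed linear gain-cost form (the paper's exact function may differ):

```python
# Toy gain-cost evaluation for iterative screening: compare the expected
# gain of three strategies and pick the best. All numbers are illustrative.
def gain(n_screened, n_hits, cost_per_compound=1.0, gain_per_hit=100.0):
    """Net gain: value of confirmed hits minus screening cost."""
    return n_hits * gain_per_hit - n_screened * cost_per_compound

# Hit counts as they might be estimated from model predictions on
# held-out training data (hypothetical values):
strategies = {
    "screen_predicted_actives": gain(n_screened=500, n_hits=40),
    "screen_everything":        gain(n_screened=8000, n_hits=120),
    "stop_screening":           gain(0, 0),
}
best = max(strategies, key=strategies.get)
```

With these numbers the full screen wins (12000 - 8000 = 4000 versus 4000 - 500 = 3500), but a higher per-compound cost would flip the decision toward the predicted actives, which is exactly the sensitivity to the gain-cost settings reported above.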

    Identifying novel inhibitors for hepatic organic anion transporting polypeptides by machine learning-based virtual screening

    Integration of statistical learning methods with structure-based modeling approaches is a contemporary strategy to identify novel lead compounds in drug discovery. Hepatic organic anion transporting polypeptides (OATP1B1, OATP1B3, and OATP2B1) are classical off-targets, and it is well recognized that their ability to interfere with a wide range of chemically unrelated drugs, environmental chemicals, or food additives can lead to unwanted adverse effects like liver toxicity and drug-drug or drug-food interactions. Therefore, the identification of novel (tool) compounds for hepatic OATPs by virtual screening approaches and subsequent experimental validation is a major asset for elucidating structure-function relationships of (related) transporters: they enhance our understanding about molecular determinants and structural aspects of hepatic OATPs driving ligand binding and selectivity. In the present study, we performed a consensus virtual screening approach by using different types of machine learning models (proteochemometric models, conformal prediction models, and XGBoost models for hepatic OATPs), followed by molecular docking of preselected hits using previously established structural models for hepatic OATPs. Screening the diverse REAL drug-like set (Enamine) shows a comparable hit rate for OATP1B1 (36% actives) and OATP1B3 (32% actives), while the hit rate for OATP2B1 was even higher (66% actives). Percentage inhibition values for 44 selected compounds were determined using dedicated in vitro assays and guided the prioritization of several highly potent novel hepatic OATP inhibitors: six (strong) OATP2B1 inhibitors (IC50 values ranging from 0.04 to 6 μM), three OATP1B1 inhibitors (2.69 to 10 μM), and five OATP1B3 inhibitors (1.53 to 10 μM) were identified. Strikingly, two novel OATP2B1 inhibitors were uncovered (C7 and H5) which show high affinity (IC50 values: 40 nM and 390 nM) comparable to the recently described estrone-based inhibitor (IC50 = 41 nM). 
A molecularly detailed explanation for the observed differences in ligand binding to the three transporters is given by means of a structural comparison of the detected binding sites and docking poses.
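The consensus step in the workflow above, keeping only compounds that several model types agree on before docking, can be sketched generically. The three classifiers below are simple stand-ins for the paper's proteochemometric, conformal, and XGBoost models, on synthetic data:

```python
# Consensus virtual screening sketch: a compound passes to docking only
# if every model type predicts it active.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=25, random_state=4)
X_train, y_train = X[:300], y[:300]
X_screen = X[300:]                       # the "library" to be screened

models = [RandomForestClassifier(random_state=4),
          LogisticRegression(max_iter=1000),
          SVC()]
votes = np.sum([m.fit(X_train, y_train).predict(X_screen) for m in models],
               axis=0)
consensus_hits = np.where(votes == len(models))[0]   # unanimous "active" only
```

Requiring unanimity trades recall for precision, which suits a pipeline where each surviving compound then incurs the cost of docking and in vitro testing.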

    Computational approaches for modeling human intestinal absorption and permeability

    Human intestinal absorption (HIA) is an important roadblock in the formulation of new drug substances, and computational models are needed for rapid estimation of this property. The measurements are determined via in vivo experiments or in vitro permeability studies. We present several computational models that predict the absorption of drugs by the human intestine and their permeability through human Caco-2 cells. The training and prediction sets were derived from literature sources and carefully examined to eliminate compounds that are actively transported. We compare our results to models derived by other methods and find that the statistical quality is similar. We believe that models derived from both sources of experimental data would provide greater consistency in predictions. The performance of the several QSPR models we investigated when predicting outside the training set, for either experimental property, clearly indicates that caution should be exercised when applying any of the models for quantitative predictions. However, we show that qualitative predictions can be obtained with close to a 70% success rate.
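The qualitative (high/low absorption) use of such models amounts to a binary classifier over molecular descriptors, evaluated by cross-validated accuracy. A minimal sketch on synthetic descriptor data, not the curated literature sets used in the paper:

```python
# QSPR-style qualitative classifier sketch: binary high/low-absorption
# labels predicted from descriptor vectors, scored by cross-validated
# accuracy (the paper reports roughly 70% for qualitative predictions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           random_state=5)
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5,
                      scoring="accuracy").mean()
```

For quantitative %HIA or Caco-2 permeability, the equivalent would be a regression model, where the abstract's caution about extrapolating beyond the training set applies even more strongly.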